1
00:00:09,180 --> 00:00:11,590
[Music]

2
00:00:11,600 --> 00:00:14,470
All right, so I want to thank everyone

3
00:00:14,480 --> 00:00:17,230
for being here.

4
00:00:17,240 --> 00:00:19,540
I'm coming from the medical school at

5
00:00:19,550 --> 00:00:20,859
Rutgers University, and I want to assure

6
00:00:20,869 --> 00:00:22,300
you that I actually have a sincere

7
00:00:22,310 --> 00:00:24,550
interest in the origins of life and I

8
00:00:24,560 --> 00:00:29,050
didn't just want a free trip to Japan.

9
00:00:29,060 --> 00:00:42,910
And I've actually, you know, as part of

10
00:00:42,920 --> 00:00:43,900
the medical school, as part of the

11
00:00:43,910 --> 00:00:50,890
medical school, I don't know how to

12
00:00:50,900 --> 00:00:52,150
operate a computer. Let's see. So as part

13
00:00:52,160 --> 00:00:54,520
of the medical school, one of the things

14
00:00:54,530 --> 00:00:57,100
that I do is develop drugs, and we're

15
00:00:57,110 --> 00:00:59,230
very interested in actually using

16
00:00:59,240 --> 00:01:01,990
structural modeling of proteins as an

17
00:01:02,000 --> 00:01:04,359
approach to designing new therapeutic

18
00:01:04,369 --> 00:01:06,820
molecules. But at the same time, you know,

19
00:01:06,830 --> 00:01:09,760
in my heart of hearts, my fundamental

20
00:01:09,770 --> 00:01:12,310
interest is in the origins of life and how

21
00:01:12,320 --> 00:01:13,179
proteins fold and how they evolve. And

22
00:01:13,189 --> 00:01:15,399
one of the things that we've been

23
00:01:15,409 --> 00:01:18,550
interested in, that I've been studying now for

24
00:01:18,560 --> 00:01:21,520
over 15 years, is the emergence of

25
00:01:21,530 --> 00:01:23,499
homochirality. And I want to thank Professor

26
00:01:23,509 --> 00:01:25,599
Whitesides for introducing this idea of

27
00:01:25,609 --> 00:01:27,399
Levinthal's paradox, because for

28
00:01:27,409 --> 00:01:30,309
me, protein folding is an important

29
00:01:30,319 --> 00:01:33,190
obstacle for biomolecules to

30
00:01:33,200 --> 00:01:34,300
overcome in the origins of life. And so one of

31
00:01:34,310 --> 00:01:36,730
the things that we were studying is,

32
00:01:36,740 --> 00:01:39,340
using our computer simulations, why are

33
00:01:39,350 --> 00:01:41,050
proteins homochiral? And what we find is

34
00:01:41,060 --> 00:01:42,940
that when sequences are homochiral,

35
00:01:42,950 --> 00:01:45,129
they're able to fold more quickly;

36
00:01:45,139 --> 00:01:46,629
they're able to find their folded state

37
00:01:46,639 --> 00:01:48,669
in a much smaller phase space. This is

38
00:01:48,679 --> 00:01:50,080
the phase space, for example, of a homochiral

39
00:01:50,090 --> 00:01:51,910
sequence, whereas if you're heterochiral,

40
00:01:51,920 --> 00:01:53,499
you have to sample a lot more

41
00:01:53,509 --> 00:01:55,330
conformations in order to find a unique

42
00:01:55,340 --> 00:01:56,709
state. And so for us that was very

43
00:01:56,719 --> 00:01:58,779
interesting, because with a very simple

44
00:01:58,789 --> 00:02:00,520
simulation we could see some very

45
00:02:00,530 --> 00:02:02,499
fundamental properties of molecules. And

46
00:02:02,509 --> 00:02:04,359
then, once we realized what were sort of

47
00:02:04,369 --> 00:02:06,760
the guiding forces that were driving

48
00:02:06,770 --> 00:02:08,529
homochirality, we were then able to use

49
00:02:08,539 --> 00:02:10,570
that to sort of violate those principles

50
00:02:10,580 --> 00:02:13,720
and start to design small peptides, for

51
00:02:13,730 --> 00:02:16,240
example this one. This is a peptide made

52
00:02:16,250 --> 00:02:17,830
out of all L-amino acids except for the

53
00:02:17,840 --> 00:02:19,929
N- and C-termini, and I have the peptide where

54
00:02:19,939 --> 00:02:21,280
there are D-amino acids that cap the

55
00:02:21,290 --> 00:02:22,030
ends of the peptide and hold it in a

56
00:02:22,040 --> 00:02:23,800
fold,

57
00:02:23,810 --> 00:02:25,750
and this peptide, even though it's only

58
00:02:25,760 --> 00:02:27,520
[inaudible] amino acids long, is very stable. It's

59
00:02:27,530 --> 00:02:29,590
able to find its native state very

60
00:02:29,600 --> 00:02:31,150
quickly and very efficiently, and then

61
00:02:31,160 --> 00:02:33,160
block an interaction that's key in the

62
00:02:33,170 --> 00:02:35,260
life cycle of the influenza virus,

63
00:02:35,270 --> 00:02:38,110
and we're now developing this as a

64
00:02:38,120 --> 00:02:39,700
potential therapeutic peptide. So even

65
00:02:39,710 --> 00:02:41,290
though I'm at the medical school and I'm

66
00:02:41,300 --> 00:02:43,510
studying origins of life, there really is

67
00:02:43,520 --> 00:02:47,110
a nice connection between these two

68
00:02:47,120 --> 00:02:49,780
sides of my life. And so when I first

69
00:02:49,790 --> 00:02:53,020
came to the medical school about 11 or

70
00:02:53,030 --> 00:02:55,600
12 years ago, I was very excited to

71
00:02:55,610 --> 00:02:57,520
meet Paul Falkowski, because he got me

72
00:02:57,530 --> 00:02:59,530
engaged in another origins-of-life

73
00:02:59,540 --> 00:03:01,270
problem, different from homochirality.

74
00:03:01,280 --> 00:03:03,070
And so this is the project I'm going to

75
00:03:03,080 --> 00:03:06,730
talk to you about today. It's primarily

76
00:03:06,740 --> 00:03:10,170
the work of a postdoc in the

77
00:03:10,180 --> 00:03:13,120
laboratory, a guy who came to us from a

78
00:03:13,130 --> 00:03:15,070
microbial physiology lab in Israel, and

79
00:03:15,080 --> 00:03:16,450
he very quickly became a structural

80
00:03:16,460 --> 00:03:18,640
biologist, so I'm very proud of the work

81
00:03:18,650 --> 00:03:20,770
that he's done. And this is also part of

82
00:03:20,780 --> 00:03:23,170
a larger team: Douglas Pike, a valuable

83
00:03:23,180 --> 00:03:25,120
graduate student in the lab, postdocs Eli

84
00:03:25,130 --> 00:03:27,210
Moore and Stephan Sin, and also

85
00:03:27,220 --> 00:03:29,620
collaborations with Yana Bromberg; you've

86
00:03:29,630 --> 00:03:31,150
heard about some of her work

87
00:03:31,160 --> 00:03:35,910
earlier. And then support from a number

88
00:03:35,920 --> 00:03:39,040
of foundations and associations. So

89
00:03:39,050 --> 00:03:41,650
Paul, when I first started talking to him

90
00:03:41,660 --> 00:03:44,740
about this, essentially was very

91
00:03:44,750 --> 00:03:47,850
interested in the origins of these

92
00:03:47,860 --> 00:03:49,930
complex protein nanomachines, these

93
00:03:49,940 --> 00:03:54,270
oxidoreductases that were capable of

94
00:03:54,280 --> 00:03:56,830
doing very, very efficiently

95
00:03:56,840 --> 00:03:59,260
sophisticated coupled electron-transfer

96
00:03:59,270 --> 00:04:00,580
catalysis. And we call these

97
00:04:00,590 --> 00:04:02,740
proteins nanomachines because they are

98
00:04:02,750 --> 00:04:05,710
massive macromolecular complexes, and

99
00:04:05,720 --> 00:04:08,290
these are critical reactions. And at some

100
00:04:08,300 --> 00:04:09,490
point, you know, they must have

101
00:04:09,500 --> 00:04:11,710
emerged, because they were critical for

102
00:04:11,720 --> 00:04:13,240
life's processes, but we couldn't

103
00:04:13,250 --> 00:04:15,130
imagine something like a nitrogenase

104
00:04:15,140 --> 00:04:17,140
climbing out of the primordial soup on

105
00:04:17,150 --> 00:04:19,599
its own. There had to be some kind of a

106
00:04:19,609 --> 00:04:21,580
simpler ancestor, and we're really

107
00:04:21,590 --> 00:04:23,770
interested in what these ancestors looked

108
00:04:23,780 --> 00:04:25,930
like. And maybe, as you walk back, they may

109
00:04:25,940 --> 00:04:27,850
even have been small peptides in the

110
00:04:27,860 --> 00:04:30,310
early Archean or late Hadean that had

111
00:04:30,320 --> 00:04:32,980
similar function, that sort of themselves

112
00:04:32,990 --> 00:04:35,590
emerged from the primordial soup. But

113
00:04:35,600 --> 00:04:37,030
as we all know, the problem is we

114
00:04:37,040 --> 00:04:39,160
don't have any information about what

115
00:04:39,170 --> 00:04:41,530
these molecules looked like. There are no

116
00:04:41,540 --> 00:04:44,050
molecular fossils for what happened at

117
00:04:44,060 --> 00:04:46,960
these early stages. All we have are the

118
00:04:46,970 --> 00:04:48,670
extant molecules, and then some

119
00:04:48,680 --> 00:04:50,020
geological information, some

120
00:04:50,030 --> 00:04:52,540
mineralogical information about what the

121
00:04:52,550 --> 00:04:54,730
rocks may have looked like at that time. So

122
00:04:54,740 --> 00:04:58,710
the question is, how can we sort of walk

123
00:04:58,720 --> 00:05:01,990
back to the common ancestors of these

124
00:05:02,000 --> 00:05:04,210
modern, extant, complex proteins and

125
00:05:04,220 --> 00:05:09,970
figure out what these original molecules

126
00:05:09,980 --> 00:05:12,190
may have looked like? So, you know, I call

127
00:05:12,200 --> 00:05:13,390
these things massive nanomachines

128
00:05:13,400 --> 00:05:15,220
because when you look at some of these

129
00:05:15,230 --> 00:05:17,440
enzymes, they really are incredibly

130
00:05:17,450 --> 00:05:19,030
complex, right? So these are, you know,

131
00:05:19,040 --> 00:05:20,980
they're doing critical reactions

132
00:05:20,990 --> 00:05:24,040
that, you know, as we've heard in many of

133
00:05:24,050 --> 00:05:27,370
the talks throughout the session, they

134
00:05:27,380 --> 00:05:29,980
take advantage of this extant

135
00:05:29,990 --> 00:05:32,860
disequilibrium on the planet, and

136
00:05:32,870 --> 00:05:34,450
it's a great source of energy. But you

137
00:05:34,460 --> 00:05:37,570
can't imagine, you know, a protein like

138
00:05:37,580 --> 00:05:40,690
this emerging spontaneously. And again,

139
00:05:40,700 --> 00:05:42,580
this is a manifestation of the

140
00:05:42,590 --> 00:05:45,190
Levinthal paradox. So, you know, think

141
00:05:45,200 --> 00:05:46,210
about something much simpler, a protein

142
00:05:46,220 --> 00:05:48,670
that's maybe more medically relevant:

143
00:05:48,680 --> 00:05:51,520
this is the beta chain of insulin.

144
00:05:51,530 --> 00:05:53,980
It's only about 30 amino acids, right? And

145
00:05:53,990 --> 00:05:56,110
so we know that for a

146
00:05:56,120 --> 00:05:58,750
protein to fold, it has to go from this

147
00:05:58,760 --> 00:06:01,500
unfolded state into a single native

148
00:06:01,510 --> 00:06:04,660
state. And each amino acid in this chain

149
00:06:04,670 --> 00:06:05,980
has two rotatable bonds, and if you think

150
00:06:05,990 --> 00:06:07,810
about sort of the staggered and eclipsed

151
00:06:07,820 --> 00:06:09,850
conformations of bonds, you could

152
00:06:09,860 --> 00:06:11,620
minimally have ten conformations per

153
00:06:11,630 --> 00:06:13,960
amino acid. So you're talking about a

154
00:06:13,970 --> 00:06:16,570
total, combinatorially, of

155
00:06:16,580 --> 00:06:18,310
about 10 to the 30th power unfolded

156
00:06:18,320 --> 00:06:20,140
conformations, out of which it finds one

157
00:06:20,150 --> 00:06:22,090
folded state. So even for something as

158
00:06:22,100 --> 00:06:25,750
small as the beta peptide of insulin,

159
00:06:25,760 --> 00:06:28,840
this is a massive problem. If you think

160
00:06:28,850 --> 00:06:31,810
about the other side of the

161
00:06:31,820 --> 00:06:34,480
paradox, which is: how did

162
00:06:34,490 --> 00:06:36,190
that particular protein evolve? You can

163
00:06:36,200 --> 00:06:37,630
also see that this is a massive

164
00:06:37,640 --> 00:06:39,700
combinatorial problem. So even for something

165
00:06:39,710 --> 00:06:42,130
this small, how do you find a specific

166
00:06:42,140 --> 00:06:44,200
sequence that has this function from the

167
00:06:44,210 --> 00:06:46,180
20 to the 30th power unique sequences,

168
00:06:46,190 --> 00:06:48,700
right? So even if the functional

169
00:06:48,710 --> 00:06:49,480
footprint of something that acts like a

170
00:06:49,490 --> 00:06:51,520
beta

171
00:06:51,530 --> 00:06:53,140
insulin peptide is, let's say there's a

172
00:06:53,150 --> 00:06:54,610
billion sequences that could do that, or

173
00:06:54,620 --> 00:06:59,439
a trillion sequences that could do that,

174
00:06:59,449 --> 00:07:02,409
you still have a huge phase space to

175
00:07:02,419 --> 00:07:03,670
search in order to find a sequence

176
00:07:03,680 --> 00:07:05,050
that's going to have this function.
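
The back-of-the-envelope numbers in this passage can be reproduced directly. The sketch below is a minimal illustration that assumes the speaker's round figures: about 10 conformations per residue, 20 amino acid types, a 30-residue chain, and a hypothetical "billion" or "trillion" functional sequences. It also uses the roughly 2,500-residue nitrogenase mentioned next. The helper names are invented for this example.

```python
import math

CHAIN_LENGTH = 30        # beta chain of insulin, ~30 residues (speaker's figure)
CONFS_PER_RESIDUE = 10   # minimal conformations per residue (speaker's estimate)
AMINO_ACID_TYPES = 20    # canonical amino acids


def conformational_space(length: int, confs_per_residue: int) -> int:
    """Backbone conformations if every residue is treated independently."""
    return confs_per_residue ** length


def sequence_space(length: int, alphabet: int = AMINO_ACID_TYPES) -> int:
    """Number of distinct sequences of the given length."""
    return alphabet ** length


def log10_sequence_space(length: int, alphabet: int = AMINO_ACID_TYPES) -> float:
    """Order of magnitude of sequence space (avoids float overflow for long chains)."""
    return length * math.log10(alphabet)


def functional_fraction(n_functional: int, length: int) -> float:
    """Fraction of sequence space occupied by the functional sequences."""
    return n_functional / sequence_space(length)


if __name__ == "__main__":
    print(f"conformations: {conformational_space(CHAIN_LENGTH, CONFS_PER_RESIDUE):.1e}")
    print(f"sequences:     {sequence_space(CHAIN_LENGTH):.2e}")
    for n in (10**9, 10**12):  # hypothetical functional "footprints"
        print(f"{n:.0e} functional -> fraction {functional_fraction(n, CHAIN_LENGTH):.1e}")
    print(f"~2,500-residue protein: ~10^{log10_sequence_space(2500):.0f} sequences")
```

Even a trillion functional 30-mers occupy only about one part in 10^27 of sequence space. This is the "huge phase space" described above.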

177
00:07:05,060 --> 00:07:07,390
This is something that's only 30

178
00:07:07,400 --> 00:07:10,300
amino acids long. If you look at

179
00:07:10,310 --> 00:07:13,870
something like nitrogenase, nitrogenase

180
00:07:13,880 --> 00:07:17,350
is about 2,500 amino acids. And so how

181
00:07:17,360 --> 00:07:19,150
does something this complex evolve when

182
00:07:19,160 --> 00:07:21,040
you have such a massive conformational

183
00:07:21,050 --> 00:07:24,010
space to search, in such a massive

184
00:07:24,020 --> 00:07:27,210
sequence space? And the answer, you know,

185
00:07:27,220 --> 00:07:29,020
this is not something that's new to

186
00:07:29,030 --> 00:07:31,719
oxidoreductases or new to this

187
00:07:31,729 --> 00:07:33,040
particular project, but

188
00:07:33,050 --> 00:07:35,529
we know that proteins really are

189
00:07:35,539 --> 00:07:36,310
assembled from much smaller domains. And so

190
00:07:36,320 --> 00:07:38,379
there are a couple of ways that

191
00:07:38,389 --> 00:07:40,270
nitrogenase solves this problem. One is

192
00:07:40,280 --> 00:07:41,680
that there are multiple proteins that

193
00:07:41,690 --> 00:07:44,560
associate to form the macromolecular

194
00:07:44,570 --> 00:07:45,820
complex. There's symmetry that a lot of

195
00:07:45,830 --> 00:07:48,939
proteins take advantage of, so you can

196
00:07:48,949 --> 00:07:52,149
double or triple or multiply your

197
00:07:52,159 --> 00:07:54,700
complexity just by taking advantage

198
00:07:54,710 --> 00:07:56,050
of symmetric transformations. And then

199
00:07:56,060 --> 00:07:57,969
the problem becomes a lot simpler. Now

200
00:07:57,979 --> 00:08:00,399
you think about how these individual

201
00:08:00,409 --> 00:08:02,620
modules evolved, and the phase space that

202
00:08:02,630 --> 00:08:04,870
they have to search is significantly

203
00:08:04,880 --> 00:08:07,120
smaller. So I say the solution is,

204
00:08:07,130 --> 00:08:09,010
you know, let's identify what these

205
00:08:09,020 --> 00:08:11,560
individual building blocks are, and they

206
00:08:11,570 --> 00:08:13,570
would be a much simpler thing to imagine

207
00:08:13,580 --> 00:08:16,719
evolving. But that's not a very easy

208
00:08:16,729 --> 00:08:19,540
thing to do, right? We can't just sort of

209
00:08:19,550 --> 00:08:20,860
carve out pieces of nitrogenase and say

210
00:08:20,870 --> 00:08:23,140
that this piece folds and then this

211
00:08:23,150 --> 00:08:25,209
piece folds, so these are our modules.

212
00:08:25,219 --> 00:08:26,680
This identification of these ancient

213
00:08:26,690 --> 00:08:29,140
building blocks is itself a very

214
00:08:29,150 --> 00:08:30,640
difficult problem. And then also, once you

215
00:08:30,650 --> 00:08:32,319
identify what these domains are, what

216
00:08:32,329 --> 00:08:33,760
these modules are, I think another

217
00:08:33,770 --> 00:08:35,740
critical problem is figuring out how

218
00:08:35,750 --> 00:08:37,800
they self-assembled, how they

219
00:08:37,810 --> 00:08:44,769
aggregated to make these more complex

220
00:08:44,779 --> 00:08:46,090
functional molecules. So the way that we

221
00:08:46,100 --> 00:08:47,170
approached this problem was to say, you

222
00:08:47,180 --> 00:08:48,699
know, if you look at something like an

223
00:08:48,709 --> 00:08:52,150
oxidoreductase, so this is fumarate

224
00:08:52,160 --> 00:08:53,560
reductase for example, let's ignore most

225
00:08:53,570 --> 00:08:55,329
of the protein and let's really just

226
00:08:55,339 --> 00:08:56,829
look at the important part of this

227
00:08:56,839 --> 00:08:59,350
protein, the part that's involved in

228
00:08:59,360 --> 00:09:01,750
electron transfer. And electron transfer

229
00:09:01,760 --> 00:09:03,310
is really just mediated by this chain of

230
00:09:03,320 --> 00:09:05,620
metals, this little necklace

231
00:09:05,630 --> 00:09:09,540
that's running through the center of the

232
00:09:09,550 --> 00:09:12,250
protein core to its active site. And

233
00:09:12,260 --> 00:09:13,870
if we argue that these are really the

234
00:09:13,880 --> 00:09:15,880
functionally important parts of the

235
00:09:15,890 --> 00:09:17,800
protein, then we would imagine that the

236
00:09:17,810 --> 00:09:19,870
fundamental modules that assemble into

237
00:09:19,880 --> 00:09:21,760
these larger molecules must be centered

238
00:09:21,770 --> 00:09:24,280
around these metals. So essentially what

239
00:09:24,290 --> 00:09:25,990
we did was we went into a database

240
00:09:26,000 --> 00:09:27,940
called the Protein Data Bank. And for those

241
00:09:27,950 --> 00:09:31,060
of you who are not familiar with this

242
00:09:31,070 --> 00:09:32,950
data set, the PDB is a

243
00:09:32,960 --> 00:09:35,170
repository for the high-resolution

244
00:09:35,180 --> 00:09:38,260
structures of proteins. So these are

245
00:09:38,270 --> 00:09:41,260
atomic-resolution structures of proteins,

246
00:09:41,270 --> 00:09:42,310
not just oxidoreductases but all kinds

247
00:09:42,320 --> 00:09:45,460
of different proteins, you know,

248
00:09:45,470 --> 00:09:47,500
hemoglobin and so forth. And there are

249
00:09:47,510 --> 00:09:49,480
over a hundred thousand different

250
00:09:49,490 --> 00:09:51,720
proteins that have been deposited in the

251
00:09:51,730 --> 00:09:55,990
PDB, I think now closer to about

252
00:09:56,000 --> 00:09:58,450
150,000. And of these, about 10,000 or so

253
00:09:58,460 --> 00:09:59,740
have metal centers in them. And what we

254
00:09:59,750 --> 00:10:02,470
did was we essentially took all of those

255
00:10:02,480 --> 00:10:03,940
proteins and we looked at what we call a

256
00:10:03,950 --> 00:10:05,950
microenvironment, which is essentially

257
00:10:05,960 --> 00:10:07,300
just the amino acids that are within a

258
00:10:07,310 --> 00:10:08,950
certain distance of the metal center,

259
00:10:08,960 --> 00:10:11,590
which we think is the important part of

260
00:10:11,600 --> 00:10:13,420
the protein for electron transfer. And we

261
00:10:13,430 --> 00:10:15,280
just excavated all of these out

262
00:10:15,290 --> 00:10:17,170
of these proteins, so we had about 30,000

263
00:10:17,180 --> 00:10:19,600
microenvironments. And then we tried to

264
00:10:19,610 --> 00:10:22,270
classify these into a smaller set of

265
00:10:22,280 --> 00:10:23,410
modules based on their metal type and

266
00:10:23,420 --> 00:10:28,600
then also based on some sort of

267
00:10:28,610 --> 00:10:30,610
structural similarity. And, you know, for

268
00:10:30,620 --> 00:10:32,080
those of you who have done comparative

269
00:10:32,090 --> 00:10:34,600
structural analysis of proteins, you know

270
00:10:34,610 --> 00:10:36,280
that this is not a trivial thing to do,

271
00:10:36,290 --> 00:10:37,990
particularly when you're looking at

272
00:10:38,000 --> 00:10:40,390
alignments that are of sort of

273
00:10:40,400 --> 00:10:43,300
intermediate quality. So for example, what

274
00:10:43,310 --> 00:10:45,730
I'm showing you here, these are two

275
00:10:45,740 --> 00:10:48,370
modules that are centered around an

276
00:10:48,380 --> 00:10:49,600
iron-sulfur cluster. These are about 15

277
00:10:49,610 --> 00:10:51,820
angstroms in radius. We're essentially

278
00:10:51,830 --> 00:10:54,670
looking at all the amino acids that are

279
00:10:54,680 --> 00:10:56,440
within 15 angstroms of that metal

280
00:10:56,450 --> 00:10:58,210
center, and what we want to ask is

281
00:10:58,220 --> 00:11:00,820
whether they have similar

282
00:11:00,830 --> 00:11:03,250
protein structure holding the metal

283
00:11:03,260 --> 00:11:04,810
in place. And these are two examples of

284
00:11:04,820 --> 00:11:06,520
the types of alignments we could get, and

285
00:11:06,530 --> 00:11:08,140
on this axis right here we have a

286
00:11:08,150 --> 00:11:10,120
similarity score. This is essentially

287
00:11:10,130 --> 00:11:13,540
telling us how well the atoms of

288
00:11:13,550 --> 00:11:15,910
those two environments align. And you can

289
00:11:15,920 --> 00:11:17,049
see here in this alignment, the red

290
00:11:17,059 --> 00:11:18,459
and orange parts,

291
00:11:18,469 --> 00:11:19,719
these are the parts that align. You can

292
00:11:19,729 --> 00:11:22,239
really see only a little bit of the

293
00:11:22,249 --> 00:11:23,969
structure aligning right here. And then

294
00:11:23,979 --> 00:11:26,469
here's another alignment between two

295
00:11:26,479 --> 00:11:28,569
modules, and you can see again only maybe

296
00:11:28,579 --> 00:11:30,609
about 20 to 30 percent of those two

297
00:11:30,619 --> 00:11:33,009
structures align, and so they have about

298
00:11:33,019 --> 00:11:34,509
the same similarity score. And so if we

299
00:11:34,519 --> 00:11:36,579
were just using standard structural

300
00:11:36,589 --> 00:11:37,899
alignment tools, we would not be able to

301
00:11:37,909 --> 00:11:38,889
say which one is a good alignment, which

302
00:11:38,899 --> 00:11:42,039
one is the bad one. I mean, they're both

303
00:11:42,049 --> 00:11:44,469
sort of on the edge of being an

304
00:11:44,479 --> 00:11:46,239
acceptable alignment. But what we know is

305
00:11:46,249 --> 00:11:48,189
that in order for these domains to

306
00:11:48,199 --> 00:11:49,359
function, they must contain this metal

307
00:11:49,369 --> 00:11:52,059
center, and this metal center is really

308
00:11:52,069 --> 00:11:54,609
sort of the functional nucleus of

309
00:11:54,619 --> 00:11:57,549
these modules. So really, those

310
00:11:57,559 --> 00:11:58,869
metal centers must also align. So

311
00:11:58,879 --> 00:12:02,259
we're using the metal center essentially

312
00:12:02,269 --> 00:12:04,089
as a fiducial marker to tell us how good

313
00:12:04,099 --> 00:12:06,579
our alignments are. So even though these

314
00:12:06,589 --> 00:12:07,779
have very similar scores, here the metals

315
00:12:07,789 --> 00:12:09,609
are right on top of each other and we

316
00:12:09,619 --> 00:12:10,779
believe this alignment, and here they're

317
00:12:10,789 --> 00:12:13,029
far apart from each other and we don't.

318
00:12:13,039 --> 00:12:14,409
So this was a real breakthrough for us,

319
00:12:14,419 --> 00:12:16,479
because it allowed us then to do this

320
00:12:16,489 --> 00:12:18,399
structure-structure comparison at large

321
00:12:18,409 --> 00:12:19,869
scale and not have to go through and

322
00:12:19,879 --> 00:12:23,399
analyze each one manually and figure out

323
00:12:23,409 --> 00:12:25,569
whether we believe the alignment or not.
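
The two steps described here can be sketched in plain Python. The first step carves out a microenvironment: all residues within 15 angstroms of a metal. The second step uses the metal as a fiducial marker to accept or reject a structural alignment. This is a minimal illustration under stated assumptions, not the group's actual pipeline. The atom-tuple record format, the rule that any single atom within the radius includes its residue, and the 2-angstrom tolerance on metal displacement are all hypothetical choices. The superposition itself (a rotation matrix plus a translation) is assumed to come from whatever structural-alignment tool is in use.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]
Atom = Tuple[str, int, Vec]  # hypothetical minimal record: (chain id, residue number, xyz)


def microenvironment(atoms: Sequence[Atom], metal: Vec,
                     radius: float = 15.0) -> List[Tuple[str, int]]:
    """Residues with at least one atom within `radius` angstroms of the metal.

    The result may be discontinuous: residues can come from distant loops
    or from different chains."""
    hits = {(chain, resnum) for chain, resnum, xyz in atoms
            if math.dist(xyz, metal) <= radius}
    return sorted(hits)


def apply_transform(p: Vec, rot: Sequence[Sequence[float]], trans: Vec) -> Vec:
    """Rigid-body superposition x' = R x + t."""
    return tuple(sum(rot[i][j] * p[j] for j in range(3)) + trans[i] for i in range(3))


def metal_fiducial_ok(metal_a: Vec, metal_b: Vec, rot, trans,
                      tolerance: float = 2.0) -> bool:
    """Accept an alignment only if, after superposing B onto A,
    the two metal centers land (nearly) on top of each other."""
    return math.dist(metal_a, apply_transform(metal_b, rot, trans)) <= tolerance


if __name__ == "__main__":
    atoms = [("A", 10, (1.0, 0.0, 0.0)), ("A", 11, (20.0, 0.0, 0.0)),
             ("B", 5, (0.0, 14.0, 0.0))]
    print(microenvironment(atoms, (0.0, 0.0, 0.0)))  # [('A', 10), ('B', 5)]
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    print(metal_fiducial_ok((0, 0, 0), (1, 0, 0), identity, (0, 0, 0)))  # True
    print(metal_fiducial_ok((0, 0, 0), (5, 0, 0), identity, (0, 0, 0)))  # False
```

The appeal of the fiducial check is that two alignments with nearly identical similarity scores can still be separated by a single, functionally motivated distance test.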

324
00:12:25,579 --> 00:12:28,509
And so we did this for all of these

325
00:12:28,519 --> 00:12:30,429
30,000 modules, and what we found is that

326
00:12:30,439 --> 00:12:32,949
there were about, let's say, a thousand

327
00:12:32,959 --> 00:12:35,199
different modules, and we could cluster

328
00:12:35,209 --> 00:12:37,389
all of these into these different

329
00:12:37,399 --> 00:12:39,789
classes. And that number, a thousand, is

330
00:12:39,799 --> 00:12:41,889
not an exact number. Depending on what

331
00:12:41,899 --> 00:12:43,179
your threshold is for similarity, you can

332
00:12:43,189 --> 00:12:45,009
make it larger, you can make it smaller.
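
The dependence of the module count on the similarity threshold can be illustrated with single-linkage clustering. Any two microenvironments whose pairwise score clears the threshold are linked, and the connected components are counted. The toy scores below are invented, and the talk does not say which clustering method was actually used. Single linkage is an assumption chosen because it makes the threshold effect easy to see.

```python
from typing import Dict, Hashable, Iterable, Tuple


def count_clusters(nodes: Iterable[Hashable],
                   scores: Dict[Tuple[Hashable, Hashable], float],
                   threshold: float) -> int:
    """Single-linkage clustering: count the connected components of the graph
    whose edges are pairs with similarity >= threshold (union-find)."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), s in scores.items():
        if s >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb  # merge the two clusters
    return len({find(n) for n in parent})


if __name__ == "__main__":
    # Hypothetical pairwise similarity scores between five microenvironments.
    scores = {("a", "b"): 0.9, ("b", "c"): 0.6, ("c", "d"): 0.4, ("d", "e"): 0.8}
    for t in (0.85, 0.5, 0.3):
        print(t, count_clusters("abcde", scores, t))  # 4, then 2, then 1 cluster(s)
```

Lowering the threshold merges classes and raising it splits them. So a figure like "about a thousand modules" is only meaningful together with the cutoff that produced it.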

333
00:12:45,019 --> 00:12:48,339
But we do find that there are a couple

334
00:12:48,349 --> 00:12:49,899
of domains, a couple of

335
00:12:49,909 --> 00:12:51,609
modules that have a lot of members, and

336
00:12:51,619 --> 00:12:53,679
within these we have the ferredoxin,

337
00:12:53,689 --> 00:12:55,539
which is not surprising, the cytochrome c,

338
00:12:55,549 --> 00:12:57,609
but then also a copper-binding

339
00:12:57,619 --> 00:12:59,309
plastocyanin, and then a four-helix

340
00:12:59,319 --> 00:13:02,079
bundle that could bind either one or two

341
00:13:02,089 --> 00:13:03,909
metal ions in the center. So it looks

342
00:13:03,919 --> 00:13:06,249
like we have, for example, here a

343
00:13:06,259 --> 00:13:12,549
couple of Legos that are commonly used

344
00:13:12,559 --> 00:13:15,369
in metalloproteins. Now, what we're

345
00:13:15,379 --> 00:13:16,689
looking at here, this is essentially, you

346
00:13:16,699 --> 00:13:18,669
know, the way that we're defining these

347
00:13:18,679 --> 00:13:20,139
microenvironments: the metal is in the

348
00:13:20,149 --> 00:13:22,389
center, and then we're sort of carving

349
00:13:22,399 --> 00:13:24,519
out amino acids that are within 15

350
00:13:24,529 --> 00:13:26,619
angstroms of that metal. And sometimes this

351
00:13:26,629 --> 00:13:29,349
is a discontinuous piece of the protein;

352
00:13:29,359 --> 00:13:31,030
it may have loops that are going out to

353
00:13:31,040 --> 00:13:33,310
another domain, or maybe this part of the

354
00:13:33,320 --> 00:13:34,269
protein is coming from one chain and this

355
00:13:34,279 --> 00:13:36,009
part of the protein is coming from another

356
00:13:36,019 --> 00:13:37,569
part of the chain. So why would you

357
00:13:37,579 --> 00:13:40,090
believe that this is actually a relevant

358
00:13:40,100 --> 00:13:42,400
module for evolution? So one of the

359
00:13:42,410 --> 00:13:44,920
things that we noticed was that when we

360
00:13:44,930 --> 00:13:46,269
look at the size distribution of these

361
00:13:46,279 --> 00:13:49,389
modules and we put them on a log-log

362
00:13:49,399 --> 00:13:52,300
plot, we see that they have a semi-linear

363
00:13:52,310 --> 00:13:55,300
relationship, and that is consistent with

364
00:13:55,310 --> 00:13:57,819
a model of domain evolution where you

365
00:13:57,829 --> 00:13:59,410
have duplication of these modules. So you

366
00:13:59,420 --> 00:14:00,879
can imagine that you have old modules

367
00:14:00,889 --> 00:14:03,970
that have been around for a long time,

368
00:14:03,980 --> 00:14:05,710
and they've had a long time to duplicate

369
00:14:05,720 --> 00:14:07,990
within genomes, and so there are a lot of

370
00:14:08,000 --> 00:14:09,999
copies of those. And then you have at the

371
00:14:10,009 --> 00:14:11,860
same time innovation: you have new

372
00:14:11,870 --> 00:14:13,900
domains that are being invented, and

373
00:14:13,910 --> 00:14:15,939
those exist at, you know, a much

374
00:14:15,949 --> 00:14:18,180
smaller fraction. And so when you have

375
00:14:18,190 --> 00:14:20,829
this sort of process of domain

376
00:14:20,839 --> 00:14:23,160
innovation and then duplication, you get

377
00:14:23,170 --> 00:14:27,069
this sort of linear relationship of

378
00:14:27,079 --> 00:14:29,410
module size versus frequency on a

379
00:14:29,420 --> 00:14:31,210
log-log plot. So this gave us

380
00:14:31,220 --> 00:14:33,970
some confidence that even though we are

381
00:14:33,980 --> 00:14:35,980
sort of creating these shaved pieces of

382
00:14:35,990 --> 00:14:37,949
proteins, this was a functionally

383
00:14:37,959 --> 00:14:40,629
relevant and evolutionarily selectable

384
00:14:40,639 --> 00:14:44,790
domain that we could then think about in

385
00:14:44,800 --> 00:14:47,920
terms of its functional consequences.

386
00:14:49,480 --> 00:14:47,930
so now the question is some of these

387
00:14:53,590 --> 00:14:49,490
domains for example like cytochrome C

388
00:14:54,850 --> 00:14:53,600
contain hundreds or close to a thousand

389
00:14:57,280 --> 00:14:54,860
different micro environments extracted

390
00:14:59,949 --> 00:14:57,290
from the PDB am I saying that all of

391
00:15:02,230 --> 00:14:59,959
these modules have a common origin

392
00:15:05,590 --> 00:15:02,240
they're all came from a from an ERV

393
00:15:06,610 --> 00:15:05,600
cytochrome C type domain and now we're

394
00:15:08,920 --> 00:15:06,620
seeing them occurring in all these

395
00:15:10,900 --> 00:15:08,930
different proteins well that's that's

396
00:15:12,819 --> 00:15:10,910
pretty hard to believe so for example

397
00:15:15,490 --> 00:15:12,829
here we're looking at just one of these

398
00:15:17,559 --> 00:15:15,500
sets of modules so each of these dots

399
00:15:20,199 --> 00:15:17,569
represents one microenvironment from a

400
00:15:22,300 --> 00:15:20,209
specific protein and an edge represents

401
00:15:23,679 --> 00:15:22,310
two micro environments that are that

402
00:15:25,990 --> 00:15:23,689
have an acceptable alignment to each

403
00:15:28,540 --> 00:15:26,000
other and so for example to get from

404
00:15:30,699 --> 00:15:28,550
this micro environment right here from

405
00:15:32,319 --> 00:15:30,709
one protein to this micro environment

406
00:15:34,179 --> 00:15:32,329
right here from another protein we would

407
00:15:35,499 --> 00:15:34,189
have to go through minimally 12

408
00:15:37,059 --> 00:15:35,509
different intermediates to get from

409
00:15:39,220 --> 00:15:37,069
there to there and we're not just

410
00:15:41,679 --> 00:15:39,230
staying within prokaryotes who may be

411
00:15:43,870 --> 00:15:41,689
going in between to a eukaryotic module

412
00:15:46,410 --> 00:15:43,880
and then back to a prokaryotic module so

413
00:15:48,519 --> 00:15:46,420
clearly these are not

414
00:15:51,040 --> 00:15:48,529
convincing evolutionary trajectories

415
00:15:52,150 --> 00:15:51,050
based on structural similarity alone so

416
00:15:53,639 --> 00:15:52,160
another way of thinking about this I

417
00:15:56,500 --> 00:15:53,649
think about are we discriminating

418
00:15:57,850 --> 00:15:56,510
homology versus analogy and really what

419
00:16:00,430 --> 00:15:57,860
I'm saying is that for example within a

420
00:16:02,769 --> 00:16:00,440
particular module like the ferredoxin or

421
00:16:04,449 --> 00:16:02,779
the cytochrome C with similarity alone

422
00:16:06,759 --> 00:16:04,459
we don't know if we were looking at all

423
00:16:08,710 --> 00:16:06,769
bird wings or whether there are bat wings

424
00:16:11,290 --> 00:16:08,720
and butterfly wings mixed in to this

425
00:16:13,030 --> 00:16:11,300
data set so that's an important thing to

426
00:16:15,819 --> 00:16:13,040
keep in mind that structural similarity

427
00:16:19,949 --> 00:16:15,829
itself is not sufficient to prove

428
00:16:26,769 --> 00:16:24,759
okay so we have a thousand or so modules

429
00:16:29,380 --> 00:16:26,779
that are being used to build these more

430
00:16:31,269 --> 00:16:29,390
complex proteins now you know what's

431
00:16:33,490 --> 00:16:31,279
really interesting we want to understand

432
00:16:34,540 --> 00:16:33,500
the emergence of complexity within these

433
00:16:38,829 --> 00:16:34,550
systems if we want to say are there any

434
00:16:40,480 --> 00:16:38,839
rules that can guide how these modules

435
00:16:42,430 --> 00:16:40,490
are connected together can we figure out

436
00:16:45,130 --> 00:16:42,440
the rules for how these Legos are

437
00:16:46,960 --> 00:16:45,140
assembled and for this we take advantage

438
00:16:48,880 --> 00:16:46,970
of the fact that we are really

439
00:16:50,050 --> 00:16:48,890
interested in electron transfer

440
00:16:53,230 --> 00:16:50,060
you know we're interested in the ability

441
00:16:56,250 --> 00:16:53,240
of proteins to to move electrons from

442
00:17:00,010 --> 00:16:56,260
one side of the protein to another from

443
00:17:01,150 --> 00:17:00,020
from an active site to a

444
00:17:04,179 --> 00:17:01,160
different part of the protein and

445
00:17:06,250 --> 00:17:04,189
there what we can take advantage of is

446
00:17:08,829 --> 00:17:06,260
that since we have the high resolution

447
00:17:10,449 --> 00:17:08,839
structures for all of these proteins we

448
00:17:12,909 --> 00:17:10,459
know the distance between each of the

449
00:17:15,850 --> 00:17:12,919
metal cofactors and there was a very

450
00:17:19,590 --> 00:17:15,860
important and influential study from Les

451
00:17:23,350 --> 00:17:19,600
Dutton's lab in the early 2000s where

452
00:17:25,449 --> 00:17:23,360
they looked at a set of oxidoreductases

453
00:17:27,970 --> 00:17:25,459
and they looked at the distances between

454
00:17:30,280 --> 00:17:27,980
pairs of metal cofactors that were

455
00:17:32,590 --> 00:17:30,290
involved in electron transfer and what

456
00:17:34,690 --> 00:17:32,600
you can see here is on this plot here

457
00:17:37,180 --> 00:17:34,700
you have the distance on this axis

458
00:17:38,770 --> 00:17:37,190
between two metal sites within a within

459
00:17:40,930 --> 00:17:38,780
a protein in an electron transport chain

460
00:17:44,159 --> 00:17:40,940
and then on this axis right here the log

461
00:17:46,780 --> 00:17:44,169
of the electron transfer rate right and

462
00:17:49,750 --> 00:17:46,790
for all electron transport chains within

463
00:17:51,580 --> 00:17:49,760
proteins the metal cofactors are at most

464
00:17:53,080 --> 00:17:51,590
fourteen angstroms away

465
00:17:54,070 --> 00:17:53,090
from each other and at fourteen

466
00:17:56,680 --> 00:17:54,080
angstroms you're now thinking about

467
00:17:59,050 --> 00:17:56,690
electron transfer rates on the

468
00:18:00,730 --> 00:17:59,060
scale of microseconds any further than

469
00:18:02,800 --> 00:18:00,740
that the electron

470
00:18:06,010 --> 00:18:02,810
transfer rates become too slow to really

471
00:18:07,750 --> 00:18:06,020
be biologically relevant and so what we

472
00:18:10,120 --> 00:18:07,760
said was that well if we were interested

473
00:18:13,450 --> 00:18:10,130
in electron transport chains we can then

474
00:18:15,550 --> 00:18:13,460
just look for modules where the distance

475
00:18:17,740 --> 00:18:15,560
between cofactors falls within this

476
00:18:18,940 --> 00:18:17,750
distance cutoff so really what we're

477
00:18:21,100 --> 00:18:18,950
doing now is we're going through the

478
00:18:22,390 --> 00:18:21,110
same data set of proteins and now

479
00:18:24,670 --> 00:18:22,400
instead of connecting

480
00:18:26,770 --> 00:18:24,680
microenvironments based on structural

481
00:18:28,540 --> 00:18:26,780
similarity we're connecting them based

482
00:18:30,370 --> 00:18:28,550
on their spatial adjacency within a

483
00:18:32,350 --> 00:18:30,380
protein so we can say that for example

484
00:18:34,300 --> 00:18:32,360
this type of ferredoxin domain or this

485
00:18:35,800 --> 00:18:34,310
type of iron-sulfur domain is often

486
00:18:37,690 --> 00:18:35,810
found next to a molybdenum site

487
00:18:40,510 --> 00:18:37,700
this type of heme domain for example is

488
00:18:42,760 --> 00:18:40,520
often found next to an iron-sulfur

489
00:18:44,950 --> 00:18:42,770
site and we can build this map of

490
00:18:50,020 --> 00:18:44,960
spatial connectivity within

491
00:18:53,650 --> 00:18:50,030
oxidoreductases so we did this and this

492
00:18:55,950 --> 00:18:53,660
is what we got so there are a lot of

493
00:18:58,360 --> 00:18:55,960
interesting things about this network

494
00:19:01,780 --> 00:18:58,370
what I'm showing you here each of these

495
00:19:04,230 --> 00:19:01,790
nodes now is not a specific protein site

496
00:19:06,160 --> 00:19:04,240
it's a module so it's the collection of

497
00:19:08,740 --> 00:19:06,170
micro environments that all have

498
00:19:10,870 --> 00:19:08,750
structural similarity the size of the

499
00:19:12,550 --> 00:19:10,880
node represents the number of

500
00:19:14,650 --> 00:19:12,560
connections it makes with other types of

501
00:19:16,960 --> 00:19:14,660
modules so these are connections these

502
00:19:18,940 --> 00:19:16,970
edges are to other modules that are

503
00:19:22,420 --> 00:19:18,950
beyond our threshold for structural

504
00:19:24,910 --> 00:19:22,430
similarity and then for the edges themselves

505
00:19:27,070 --> 00:19:24,920
the thickness of the edges

506
00:19:29,740 --> 00:19:27,080
represents the number of instances of a

507
00:19:31,000 --> 00:19:29,750
particular connection that we see and we

508
00:19:33,580 --> 00:19:31,010
can see here that those same four

509
00:19:37,030 --> 00:19:33,590
modules that were highly represented in

510
00:19:39,280 --> 00:19:37,040
the data set also

511
00:19:42,220 --> 00:19:39,290
make a large number of connections with

512
00:19:44,470 --> 00:19:42,230
other types of modules in this

513
00:19:48,640 --> 00:19:44,480
spatial adjacency Network within the

514
00:19:52,690 --> 00:19:48,650
span and so what rules can we get for

515
00:19:55,300 --> 00:19:52,700
the assembly of electron transport

516
00:19:57,400 --> 00:19:55,310
chains from looking at this well one of

517
00:20:01,120 --> 00:19:57,410
the things that we noticed for about 30%

518
00:20:03,580 --> 00:20:01,130
of module-module connections we have

519
00:20:05,710 --> 00:20:03,590
instead of connecting one type of module

520
00:20:07,240 --> 00:20:05,720
to another we had these loops and so

521
00:20:10,270 --> 00:20:07,250
what a loop here represents essentially

522
00:20:10,690 --> 00:20:10,280
is a ferredoxin-type module connected

523
00:20:12,850 --> 00:20:10,700
to another

524
00:20:15,250 --> 00:20:12,860
ferredoxin-type module or a cytochrome

525
00:20:16,660 --> 00:20:15,260
C connected to another cytochrome C or a

526
00:20:19,150 --> 00:20:16,670
rubredoxin connected to another

527
00:20:21,840 --> 00:20:19,160
rubredoxin and so you know again

528
00:20:23,530 --> 00:20:21,850
this is not something that is new to

529
00:20:25,090 --> 00:20:23,540
oxidoreductases this is something that

530
00:20:27,970 --> 00:20:25,100
you classically see in a lot of

531
00:20:29,740 --> 00:20:27,980
different multi-domain proteins is that

532
00:20:31,240 --> 00:20:29,750
the way that you make complexity or you

533
00:20:33,520 --> 00:20:31,250
make larger proteins from smaller

534
00:20:36,070 --> 00:20:33,530
domains is through duplication and

535
00:20:39,250 --> 00:20:36,080
diversification so there are some very

536
00:20:40,600 --> 00:20:39,260
clear examples of this in

537
00:20:43,000 --> 00:20:40,610
oxidoreductases you have seen for

538
00:20:45,340 --> 00:20:43,010
example these multiheme proteins in

539
00:20:47,320 --> 00:20:45,350
Geobacter for example that allow

540
00:20:50,110 --> 00:20:47,330
that allow electron transfer from

541
00:20:53,260 --> 00:20:50,120
mineral substrates into the into the

542
00:20:55,600 --> 00:20:53,270
cell or here's an iron-sulfur wire

543
00:20:57,250 --> 00:20:55,610
that's made out of multiple ferredoxins

544
00:20:59,820 --> 00:20:57,260
that are connected together we see

545
00:21:02,200 --> 00:20:59,830
similar things for plastocyanins for

546
00:21:03,580 --> 00:21:02,210
ferritin and so forth but I think what's

547
00:21:05,710 --> 00:21:03,590
really interesting here is that it's not

548
00:21:07,240 --> 00:21:05,720
just sort of these very clear examples

549
00:21:10,140 --> 00:21:07,250
where you have these multi cofactor

550
00:21:12,310 --> 00:21:10,150
chains but nearly every module has

551
00:21:14,530 --> 00:21:12,320
examples of these sort of duplications

552
00:21:16,540 --> 00:21:14,540
so clearly you know an important rule

553
00:21:18,220 --> 00:21:16,550
for how do you build complexity is to

554
00:21:21,210 --> 00:21:18,230
just copy something and connect it to a

555
00:21:23,400 --> 00:21:21,220
domain of the same kind so that is

556
00:21:25,870 --> 00:21:23,410
that's one rule that came out of this

557
00:21:27,910 --> 00:21:25,880
but one of the other things that we

558
00:21:30,100 --> 00:21:27,920
found very interesting and it jumps out

559
00:21:32,500 --> 00:21:30,110
at you if you color the nodes by the

560
00:21:35,260 --> 00:21:32,510
types of cofactors that they bind is

561
00:21:37,960 --> 00:21:35,270
that all of the

562
00:21:39,630 --> 00:21:37,970
cofactors of the same type are connected

563
00:21:43,810 --> 00:21:39,640
to each other so for example here yellow

564
00:21:46,180 --> 00:21:43,820
represents iron sulfur cofactors and

565
00:21:48,280 --> 00:21:46,190
this is not just for four-iron four-sulfur

566
00:21:49,810 --> 00:21:48,290
this is two-iron two-sulfur or something

567
00:21:51,730 --> 00:21:49,820
like rubredoxin where you have a

568
00:21:53,860 --> 00:21:51,740
single iron and four cysteines

569
00:21:56,500 --> 00:21:53,870
coordinating it so all of these are

570
00:21:57,880 --> 00:21:56,510
connected to each other and a connection

571
00:21:59,650 --> 00:21:57,890
here remember does not mean structural

572
00:22:02,830 --> 00:21:59,660
similarity so we're not saying that a

573
00:22:04,780 --> 00:22:02,840
Rieske-type iron-sulfur cluster site

574
00:22:06,820 --> 00:22:04,790
looks a lot like a four-iron four-sulfur

575
00:22:08,200 --> 00:22:06,830
from bacterial ferredoxin they're

576
00:22:09,730 --> 00:22:08,210
structurally very distinct from each

577
00:22:11,230 --> 00:22:09,740
other but what we're saying is that

578
00:22:11,620 --> 00:22:11,240
they're often found connected to each

579
00:22:13,840 --> 00:22:11,630
other

580
00:22:17,620 --> 00:22:13,850
the same thing is true for these four

581
00:22:19,690 --> 00:22:17,630
helix bundle type single iron sites that

582
00:22:23,170 --> 00:22:19,700
are connected to a lot of other mono

583
00:22:24,490 --> 00:22:23,180
metal binding sites same thing is true

584
00:22:27,850 --> 00:22:24,500
for heme binding sites same thing

585
00:22:29,170 --> 00:22:27,860
true for copper binding sites so this is

586
00:22:31,570 --> 00:22:29,180
actually very interesting why are we

587
00:22:34,930 --> 00:22:31,580
getting this metal segregation within

588
00:22:36,280 --> 00:22:34,940
this graph and so

589
00:22:39,460 --> 00:22:36,290
there are a couple of explanations for

590
00:22:40,420 --> 00:22:39,470
this so one would be that what we're

591
00:22:42,090 --> 00:22:40,430
seeing here because these are

592
00:22:45,190 --> 00:22:42,100
essentially we're arguing that these are

593
00:22:47,500 --> 00:22:45,200
electron transport pathways so one

594
00:22:51,430 --> 00:22:47,510
argument would be that all iron sulfur

595
00:22:53,290 --> 00:22:51,440
sites have similar redox potentials and

596
00:22:54,970 --> 00:22:53,300
so the fact that you have all of these

597
00:22:56,280 --> 00:22:54,980
these iron-sulfur sites connected to

598
00:22:58,420 --> 00:22:56,290
each other is essentially just a

599
00:23:00,100 --> 00:22:58,430
thermodynamic phenomenon that this

600
00:23:04,300 --> 00:23:00,110
ensures that you don't have any high

601
00:23:06,280 --> 00:23:04,310
barriers to transfer from one

602
00:23:07,930 --> 00:23:06,290
site to the next but we know from

603
00:23:10,630 --> 00:23:07,940
protein engineering studies that you can

604
00:23:13,450 --> 00:23:10,640
have the same metal site and just make

605
00:23:15,370 --> 00:23:13,460
single amino acid changes around in the

606
00:23:18,310 --> 00:23:15,380
second shell around the metal site and you

607
00:23:20,410 --> 00:23:18,320
can move the redox potential by over a

608
00:23:22,480 --> 00:23:20,420
volt so you can for example with iron

609
00:23:25,060 --> 00:23:22,490
sulfur sites or with heme sites have a

610
00:23:27,550 --> 00:23:25,070
huge tuning potential without changing

611
00:23:29,470 --> 00:23:27,560
the metal type so that is a

612
00:23:30,760 --> 00:23:29,480
possible explanation but it's not

613
00:23:33,490 --> 00:23:30,770
necessarily a very convincing one

614
00:23:35,710 --> 00:23:33,500
another one to think about is protein

615
00:23:37,540 --> 00:23:35,720
biosynthesis so if you're making a

616
00:23:41,170 --> 00:23:37,550
protein that contains multiple cofactors

617
00:23:43,000 --> 00:23:41,180
it might be easier in terms of assembly

618
00:23:45,030 --> 00:23:43,010
to have all the metal cofactors be the

619
00:23:47,320 --> 00:23:45,040
same and then you can provide multiple

620
00:23:49,860 --> 00:23:47,330
iron sulfur clusters to a single protein

621
00:23:51,730 --> 00:23:49,870
or multiple hemes to a single protein

622
00:23:53,980 --> 00:23:51,740
but we know that there are many examples

623
00:23:56,410 --> 00:23:53,990
of oxidoreductases that have multiple

624
00:23:58,570 --> 00:23:56,420
different cofactor types in them so

625
00:23:59,920 --> 00:23:58,580
those are two explanations but another

626
00:24:01,950 --> 00:23:59,930
which we think is particularly

627
00:24:05,380 --> 00:24:01,960
tantalizing one that we're now exploring

628
00:24:07,720 --> 00:24:05,390
experimentally within the lab is perhaps

629
00:24:10,840 --> 00:24:07,730
what this is suggesting is that these

630
00:24:13,450 --> 00:24:10,850
physical connections

631
00:24:15,220 --> 00:24:13,460
between modules represent duplication

632
00:24:17,500 --> 00:24:15,230
and then significant diversification

633
00:24:20,830 --> 00:24:17,510
that you have you know the iron sulfur

634
00:24:22,810 --> 00:24:20,840
site constraining the first-shell

635
00:24:25,570 --> 00:24:22,820
ligands but then beyond that you get

636
00:24:27,640 --> 00:24:25,580
significant diversification of the

637
00:24:29,500 --> 00:24:27,650
second shell and a microenvironment on

638
00:24:32,440 --> 00:24:29,510
the protein so in other words that these

639
00:24:33,820 --> 00:24:32,450
these connections in space may

640
00:24:37,960 --> 00:24:33,830
actually represent evolutionary

641
00:24:38,350 --> 00:24:37,970
connections and this is you know I would

642
00:24:39,460 --> 00:24:38,360
say that

643
00:24:41,530 --> 00:24:39,470
this is something that we still haven't

644
00:24:42,940 --> 00:24:41,540
proven but it's the way I like to think

645
00:24:44,980 --> 00:24:42,950
about this is you know we have this

646
00:24:46,930 --> 00:24:44,990
expression in English that if you're

647
00:24:48,340 --> 00:24:46,940
comparing apples and oranges you're

648
00:24:49,990 --> 00:24:48,350
talking about two very different things

649
00:24:50,980 --> 00:24:50,000
right they're both fruits but they're

650
00:24:53,500 --> 00:24:50,990
very different fruits from each other

651
00:24:55,570 --> 00:24:53,510
one is citrus the other is not but what

652
00:24:57,910 --> 00:24:55,580
if you were to walk out and you find a

653
00:24:58,660 --> 00:24:57,920
tree that had both apples and oranges on

654
00:25:00,400 --> 00:24:58,670
the same tree

655
00:25:01,900 --> 00:25:00,410
now you'd say well maybe this

656
00:25:03,820 --> 00:25:01,910
expression doesn't make so

657
00:25:05,800 --> 00:25:03,830
much sense maybe apples and oranges are

658
00:25:07,630 --> 00:25:05,810
a lot more similar than we

659
00:25:11,110 --> 00:25:07,640
thought and what we're thinking we might

660
00:25:12,250 --> 00:25:11,120
be seeing here in that span is what we

661
00:25:14,500 --> 00:25:12,260
originally thought to be apples and

662
00:25:16,930 --> 00:25:14,510
oranges occurring on the same tree so if

663
00:25:20,440 --> 00:25:16,940
that's the case then what we have now is

664
00:25:22,120 --> 00:25:20,450
a tool for discriminating analogy

665
00:25:25,980 --> 00:25:22,130
and homology so if we go back for example

666
00:25:28,510 --> 00:25:25,990
to the heme-binding cytochrome C

667
00:25:30,940 --> 00:25:28,520
module so this is a module that had

668
00:25:33,010 --> 00:25:30,950
about a thousand different members and

669
00:25:36,940 --> 00:25:33,020
let's take this single module now and we

670
00:25:38,610 --> 00:25:36,950
cluster it using a Louvain clustering

671
00:25:41,500 --> 00:25:38,620
method it's just one way of sort of

672
00:25:44,890 --> 00:25:41,510
classifying subgraphs within a

673
00:25:46,510 --> 00:25:44,900
larger graph let's say we classify this

674
00:25:49,510 --> 00:25:46,520
into like eight smaller segments and

675
00:25:51,550 --> 00:25:49,520
this is these connections here are based

676
00:25:54,160 --> 00:25:51,560
on structural similarity and then we

677
00:25:56,230 --> 00:25:54,170
take that and we now generate a span for

678
00:25:58,570 --> 00:25:56,240
that so now we say within those

679
00:26:00,280 --> 00:25:58,580
subgraphs which ones are found spatially

680
00:26:02,230 --> 00:26:00,290
next to each other within the same

681
00:26:04,800 --> 00:26:02,240
protein and what we find is actually

682
00:26:07,510 --> 00:26:04,810
within this larger module there are

683
00:26:09,520 --> 00:26:07,520
subclasses so that one and two are often

684
00:26:10,690 --> 00:26:09,530
found connected to each other but we

685
00:26:13,540 --> 00:26:10,700
never see connections between one and

686
00:26:15,490 --> 00:26:13,550
two and any of the other cytochrome C

687
00:26:16,980 --> 00:26:15,500
type modules between three and eight so

688
00:26:19,660 --> 00:26:16,990
maybe there actually are two

689
00:26:22,180 --> 00:26:19,670
evolutionarily different cytochrome C

690
00:26:23,860 --> 00:26:22,190
type modules that all they share really

691
00:26:25,960 --> 00:26:23,870
is just this chemical similarity that

692
00:26:28,780 --> 00:26:25,970
they bind hemes but they have otherwise

693
00:26:30,160 --> 00:26:28,790
independent evolutionary origins so in

694
00:26:32,710 --> 00:26:30,170
other words what we're looking at here

695
00:26:34,600 --> 00:26:32,720
is convergent evolution of the one-two class

696
00:26:42,570 --> 00:26:34,610
and the three-through-eight class but then

697
00:26:46,330 --> 00:26:45,040
so then that suggests if we look at the

698
00:26:48,940 --> 00:26:46,340
go back and look at the network with

699
00:26:51,340 --> 00:26:48,950
this mindset that perhaps we have four

700
00:26:52,040 --> 00:26:51,350
or five or six fundamental modules maybe

701
00:26:53,600 --> 00:26:52,050
the ferredoxin

702
00:26:55,490 --> 00:26:53,610
and the ferredoxin obviously is great

703
00:26:57,950 --> 00:26:55,500
analyzing when this may have been one of

704
00:27:00,640 --> 00:26:57,960
the first iron-sulfur modules that then

705
00:27:03,200 --> 00:27:00,650
diversified into a number of other

706
00:27:05,780 --> 00:27:03,210
module types the same thing with maybe

707
00:27:10,400 --> 00:27:05,790
one of these cytochrome c type modules

708
00:27:15,230 --> 00:27:10,410
the four-helix bundle as a source and

709
00:27:17,320 --> 00:27:15,240
then the plastocyanin for copper so

710
00:27:21,110 --> 00:27:17,330
we're very excited about exploring now

711
00:27:23,240 --> 00:27:21,120
whether or not we can use these as sort

712
00:27:25,390 --> 00:27:23,250
of archetypes for understanding the

713
00:27:31,760 --> 00:27:25,400
evolution of these original

714
00:27:32,960 --> 00:27:31,770
oxidoreductase modules so what I'd like

715
00:27:35,660 --> 00:27:32,970
to say at this point is that you know we

716
00:27:38,660 --> 00:27:35,670
can take something like a large complex

717
00:27:40,970 --> 00:27:38,670
oxidoreductase and decompose it into

718
00:27:42,200 --> 00:27:40,980
smaller modules and we believe that

719
00:27:44,540 --> 00:27:42,210
these modules are behaving like

720
00:27:45,830 --> 00:27:44,550
evolutionarily selectable domains

721
00:27:47,990 --> 00:27:45,840
they're functionally discrete they're

722
00:27:50,360 --> 00:27:48,000
selectable and we believe that this

723
00:27:53,270 --> 00:27:50,370
complexity likely evolved from the

724
00:27:57,290 --> 00:27:53,280
assembly of these smaller modules into

725
00:27:59,120 --> 00:27:57,300
these much larger complexes and very

726
00:28:01,400 --> 00:27:59,130
simply through domain duplication to

727
00:28:03,170 --> 00:28:01,410
build wires but then also maybe through

728
00:28:05,870 --> 00:28:03,180
diversification to develop these more

729
00:28:07,580 --> 00:28:05,880
functionally specialized branches of

730
00:28:10,400 --> 00:28:07,590
these electron transport pathways and

731
00:28:12,680 --> 00:28:10,410
perhaps we can start exploring these

732
00:28:15,230 --> 00:28:12,690
these spatial adjacency relationships as

733
00:28:17,390 --> 00:28:15,240
a construction to look much more deeply

734
00:28:18,950 --> 00:28:17,400
into the fossil history of proteins

735
00:28:21,200 --> 00:28:18,960
where we're not depending on the

736
00:28:23,030 --> 00:28:21,210
vagaries of structural alignments which

737
00:28:26,150 --> 00:28:23,040
are themselves more sensitive to deep

738
00:28:27,560 --> 00:28:26,160
time than sequence alignments for

739
00:28:29,840 --> 00:28:27,570
making these relationships but also

740
00:28:33,340 --> 00:28:29,850
using this as another way to establish

741
00:28:38,600 --> 00:28:33,350
connections between structural domains

742
00:28:40,550 --> 00:28:38,610
and as with our work on

743
00:28:43,340 --> 00:28:40,560
homochirality and our ability to relate that

744
00:28:45,080 --> 00:28:43,350
to the design of therapeutic peptides we

745
00:28:47,570 --> 00:28:45,090
also see in this case evolution and

746
00:28:49,340 --> 00:28:47,580
design as two sides of the same coin so

747
00:28:50,900 --> 00:28:49,350
while we are studying the evolutionary

748
00:28:52,940 --> 00:28:50,910
relationships between these modules

749
00:28:54,740 --> 00:28:52,950
we're also now thinking about ways of

750
00:28:57,680 --> 00:28:54,750
hooking these things up together to

751
00:29:00,170 --> 00:28:57,690
start to make nanoscale devices for

752
00:29:01,460 --> 00:29:00,180
bioelectronics and Eric you may look at

753
00:29:03,110 --> 00:29:01,470
this and you may see a bifurcating

754
00:29:04,970 --> 00:29:03,120
pathway right here we're actually very

755
00:29:05,990 --> 00:29:04,980
interested in going back to the span and

756
00:29:07,100 --> 00:29:06,000
think about

757
00:29:08,480 --> 00:29:07,110
in addition to just pairwise

758
00:29:11,450 --> 00:29:08,490
interactions maybe multi-body

759
00:29:13,670 --> 00:29:11,460
interactions between

760
00:29:17,030 --> 00:29:13,680
these metal sites to look at the

761
00:29:23,750 --> 00:29:17,040
emergence of more complex topologies in

762
00:29:26,690 --> 00:29:23,760
electron transport and so perhaps

763
00:29:28,910 --> 00:29:26,700
instead of having a single ancestor that

764
00:29:31,070 --> 00:29:28,920
has led to you know the emergence of all

765
00:29:33,260 --> 00:29:31,080
these different oxidoreductases

766
00:29:36,380 --> 00:29:33,270
maybe we had several LUCAs or maybe you

767
00:29:38,000 --> 00:29:36,390
want to call them LUCAs that themselves

768
00:29:41,990 --> 00:29:38,010
assembled in different ways to make

769
00:29:43,070 --> 00:29:42,000
these modern extant nanomachines and so

770
00:29:45,500 --> 00:29:43,080
what we're doing now is we're starting

771
00:29:47,480 --> 00:29:45,510
to ask how can we walk from these very

772
00:29:49,670 --> 00:29:47,490
simple domains which themselves are

773
00:29:51,440 --> 00:29:49,680
fairly functionally naive to perhaps

774
00:29:53,060 --> 00:29:51,450
complexes where you have two or three

775
00:29:54,350 --> 00:29:53,070
domains that can do more interesting

776
00:29:56,900 --> 00:29:54,360
things we want to move in this direction

777
00:29:58,400 --> 00:29:56,910
towards the complex machines and see

778
00:30:00,620 --> 00:29:58,410
what are the minimal assemblies that

779
00:30:03,650 --> 00:30:00,630
give us the functional the catalytic

780
00:30:04,820 --> 00:30:03,660
properties that we want but also what's

781
00:30:07,460 --> 00:30:04,830
interesting now is we already have

782
00:30:08,780 --> 00:30:07,470
fairly simple archetypes of what these

783
00:30:11,000 --> 00:30:08,790
original modules may have looked like

784
00:30:12,830 --> 00:30:11,010
can we start to walk them back even

785
00:30:14,930 --> 00:30:12,840
further can we go backwards in evolution

786
00:30:16,490 --> 00:30:14,940
past the common ancestor and

787
00:30:19,100 --> 00:30:16,500
start thinking about what these

788
00:30:21,470 --> 00:30:19,110
prebiotic peptide and/or peptides may

789
00:30:22,790 --> 00:30:21,480
have looked like and of course this is

790
00:30:24,620 --> 00:30:22,800
something that we heard a little bit

791
00:30:26,690 --> 00:30:24,630
about yesterday thinking about how

792
00:30:27,890 --> 00:30:26,700
peptides may have interacted with

793
00:30:31,400 --> 00:30:27,900
minerals themselves that are already

794
00:30:33,380 --> 00:30:31,410
capable of redox catalysis how peptides

795
00:30:36,020 --> 00:30:33,390
may have aided this and I'll just show

796
00:30:38,590 --> 00:30:36,030
one slide that shows one of our forays

797
00:30:40,820 --> 00:30:38,600
into this area and we've been looking at

798
00:30:43,280 --> 00:30:40,830
bacterial ferredoxin for a long time

799
00:30:45,800 --> 00:30:43,290
it's a protein that binds two four-iron

800
00:30:48,890 --> 00:30:45,810
four-sulfur clusters it's about 60 amino

801
00:30:52,280 --> 00:30:48,900
acids it itself is clearly a domain

802
00:30:54,500 --> 00:30:52,290
duplication of two 20 to 30 amino acid

803
00:30:56,390 --> 00:30:54,510
domains and what we've done is by

804
00:30:58,370 --> 00:30:56,400
looking at the business

805
00:31:00,710 --> 00:30:58,380
end of this molecule that is responsible

806
00:31:03,410 --> 00:31:00,720
for binding the iron-sulfur cluster

807
00:31:05,660 --> 00:31:03,420
we've been able to reduce this 60 amino

808
00:31:07,670 --> 00:31:05,670
acid protein about fivefold

809
00:31:10,190 --> 00:31:07,680
to this small cyclic peptide which is

810
00:31:11,720 --> 00:31:10,200
only 12 amino acids and this 12 amino

811
00:31:14,150 --> 00:31:11,730
acid peptide is able to stably

812
00:31:16,370 --> 00:31:14,160
bind a four-iron four-sulfur cluster this

813
00:31:17,260 --> 00:31:16,380
is an EPR spectrum showing that it has

814
00:31:22,510 --> 00:31:17,270
the

815
00:31:23,800 --> 00:31:22,520
sulfur protein but what we find

816
00:31:26,620 --> 00:31:23,810
particularly exciting about this

817
00:31:29,440 --> 00:31:26,630
particular design is that if you notice

818
00:31:31,600 --> 00:31:29,450
here if you look at the the topology of

819
00:31:33,550 --> 00:31:31,610
this protein all of the backbone amides

820
00:31:35,680 --> 00:31:33,560
which are in blue are pointing in

821
00:31:39,370 --> 00:31:35,690
towards the iron-sulfur cluster and

822
00:31:41,260 --> 00:31:39,380
this is something that a structural

823
00:31:43,660 --> 00:31:41,270
biologist Milner-White would call a

824
00:31:45,040 --> 00:31:43,670
cationic nest so essentially all these

825
00:31:47,380 --> 00:31:45,050
backbone amides are creating a nice

826
00:31:49,330 --> 00:31:47,390
stable binding site for an iron sulfur

827
00:31:50,890 --> 00:31:49,340
cluster and so what happens now is that

828
00:31:53,140 --> 00:31:50,900
because you have the stable binding site

829
00:31:55,120 --> 00:31:53,150
we're able to take this complex and

830
00:31:57,250 --> 00:31:55,130
oxidize and reduce it thousands of times

831
00:31:59,530 --> 00:31:57,260
and it doesn't fall apart so this thing

832
00:32:02,380 --> 00:31:59,540
has a redox potential close to that of a

833
00:32:05,560 --> 00:32:02,390
ferredoxin but it's very stable and I've

834
00:32:08,500 --> 00:32:05,570
designed several iron sulfur proteins in

835
00:32:09,640 --> 00:32:08,510
my career as a protein designer and the

836
00:32:11,530 --> 00:32:09,650
best that we've been able to do before

837
00:32:13,510 --> 00:32:11,540
this was about 16 cycles before the

838
00:32:14,830 --> 00:32:13,520
thing falls apart and in fact the most

839
00:32:16,810 --> 00:32:14,840
recent design before this fell apart

840
00:32:18,970 --> 00:32:16,820
after one cycle so something that has

841
00:32:20,230 --> 00:32:18,980
this extent of stability is

842
00:32:23,500 --> 00:32:20,240
unprecedented so we're very excited

843
00:32:25,900 --> 00:32:23,510
about this and so designs like this that

844
00:32:28,000 --> 00:32:25,910
are inspired by these small

845
00:32:30,400 --> 00:32:28,010
domains may be a way for us to

846
00:32:33,370 --> 00:32:30,410
extrapolate back to what prebiotic

847
00:32:36,310 --> 00:32:33,380
peptides may have looked like so I'll end there

848
00:32:37,810 --> 00:32:36,320
and thanks again for the invitation and

849
00:32:46,300 --> 00:32:37,820
the chance to speak and I welcome any

850
00:32:47,590 --> 00:32:46,310
questions wonderful talk thank you and

851
00:32:51,160 --> 00:32:47,600
we'll take the first question from

852
00:32:52,080 --> 00:32:51,170
George that was really fabulous I made

853
00:32:54,700 --> 00:32:52,090
it

854
00:32:56,830 --> 00:32:54,710
[inaudible] a couple of quick

855
00:32:58,840 --> 00:32:56,840
questions in these electron transfer

856
00:33:00,670 --> 00:32:58,850
chains are you seeing hopping are you

857
00:33:03,280 --> 00:33:00,680
seeing drift are you seeing tunneling

858
00:33:05,920 --> 00:33:03,290
what is the mechanism so we're agnostic

859
00:33:07,210 --> 00:33:05,930
to the mechanism we don't know if we're

860
00:33:09,130 --> 00:33:07,220
seeing tunneling or if we're seeing

861
00:33:14,200 --> 00:33:09,140
hopping right I mean that you're looking

862
00:33:17,050 --> 00:33:14,210
essentially at connections between metal

863
00:33:22,060 --> 00:33:17,060
clusters that are within the

864
00:33:24,220 --> 00:33:22,070
context of the protein matrix so if

865
00:33:27,160 --> 00:33:24,230
you're Harry Gray then you would

866
00:33:28,990 --> 00:33:27,170
would be looking for essentially hopping

867
00:33:30,400 --> 00:33:29,000
intermediates between these

868
00:33:32,110 --> 00:33:30,410
metal clusters so aromatics for

869
00:33:33,100 --> 00:33:32,120
example and so one of the things that

870
00:33:34,930 --> 00:33:33,110
we're interested in looking at now that

871
00:33:36,970 --> 00:33:34,940
we've sort of identified what these

872
00:33:39,520 --> 00:33:36,980
electron transport pathways are is to see

873
00:33:40,840 --> 00:33:39,530
whether we see amino acids between them

874
00:33:43,000 --> 00:33:40,850
that may be affecting the conductivity

875
00:33:45,760 --> 00:33:43,010
the beta of the environment that would

876
00:33:47,560 --> 00:33:45,770
help us delineate the mechanism okay and

877
00:33:49,930 --> 00:33:47,570
then one other question of the many I

878
00:33:51,490 --> 00:33:49,940
could ask if you look at the genomic

879
00:33:54,070 --> 00:33:51,500
structure of these systems that have

880
00:33:56,590 --> 00:33:54,080
these hypothesized multiple domain

881
00:33:59,470 --> 00:33:56,600
replications are they contiguous are

882
00:34:01,690 --> 00:33:59,480
they intron exon mixtures how do you

883
00:34:03,400 --> 00:34:01,700
actually get these piled together in the

884
00:34:06,490 --> 00:34:03,410
genome in such a fashion that you end up

885
00:34:08,650 --> 00:34:06,500
with the collection of contiguous amino

886
00:34:10,060 --> 00:34:08,660
acids as you see so we simply haven't

887
00:34:11,620 --> 00:34:10,070
done that but I think that's an

888
00:34:13,750 --> 00:34:11,630
important next step is to start thinking

889
00:34:16,419 --> 00:34:13,760
because sequence evolution doesn't

890
00:34:18,970 --> 00:34:16,429
happen in structure you don't glom

891
00:34:21,400 --> 00:34:18,980
units together it has to happen at the

892
00:34:22,930 --> 00:34:21,410
level of sequence and previously we

893
00:34:25,090 --> 00:34:22,940
tried to look at this problem using

894
00:34:27,280 --> 00:34:25,100
sequence analysis alone trying to

895
00:34:28,810 --> 00:34:27,290
extrapolate from one type of metal

896
00:34:32,020 --> 00:34:28,820
binding site to another through sequence

897
00:34:33,940 --> 00:34:32,030
intermediates and I think that combining

898
00:34:35,110 --> 00:34:33,950
that analysis with the structural

899
00:34:35,500 --> 00:34:35,120
analysis would be a way to get at your

900
00:34:41,800 --> 00:34:35,510
question

901
00:34:43,480 --> 00:34:41,810
definitely hey so great talk for the you

902
00:34:45,330 --> 00:34:43,490
know amino acid sequences that are involved in

903
00:34:48,190 --> 00:34:45,340
these highly stable kind of small

904
00:34:49,750 --> 00:34:48,200
systems that stabilize these yes yeah

905
00:34:52,000 --> 00:34:49,760
yeah is there any relationship between

906
00:34:53,650 --> 00:34:52,010
the amino acids present in those and the

907
00:34:55,810 --> 00:34:53,660
biosynthetic pathways by which those

908
00:34:58,330 --> 00:34:55,820
amino acids are generated meaning are

909
00:34:59,950 --> 00:34:58,340
they kind of early amino acids

910
00:35:06,640 --> 00:34:59,960
or are these ones that came along

911
00:35:08,770 --> 00:35:06,650
quite a bit later so I don't know so I

912
00:35:10,840 --> 00:35:08,780
don't I don't know if there is if we can

913
00:35:13,480 --> 00:35:10,850
only make these things with simple amino

914
00:35:15,640 --> 00:35:13,490
acids what I'll say is that the sequence

915
00:35:17,340 --> 00:35:15,650
pattern that we're using here is very

916
00:35:21,130 --> 00:35:17,350
similar to the one that was proposed by

917
00:35:23,530 --> 00:35:21,140
Dayhoff and Eck which is that small

918
00:35:25,030 --> 00:35:23,540
four amino acid repeat and critical to

919
00:35:27,580 --> 00:35:25,040
that is having a cysteine obviously

920
00:35:30,520 --> 00:35:27,590
binding the cluster within the

921
00:35:37,170 --> 00:35:30,530
flanking amino acids amino acids like

922
00:35:40,500 --> 00:35:37,180
lysine and glycine work very well yeah

923
00:35:42,410 --> 00:35:40,510
whoever has the Chumash the magic cube

924
00:35:45,289 --> 00:35:42,420
yeah

925
00:35:49,370 --> 00:35:45,299
hopefully quick stunning like everybody

926
00:35:53,329 --> 00:35:49,380
else says it's a question about the way

927
00:35:56,030 --> 00:35:53,339
you use this paradigm for

928
00:35:58,190 --> 00:35:56,040
evolutionary interpretation it looks

929
00:36:00,020 --> 00:35:58,200
like this natural modularization gives

930
00:36:03,849 --> 00:36:00,030
you a kind of typology of functional

931
00:36:06,500 --> 00:36:03,859
states and if I think about an

932
00:36:09,500 --> 00:36:06,510
evolutionary model I want a model of

933
00:36:11,510 --> 00:36:09,510
states and transitions if I think about

934
00:36:14,270 --> 00:36:11,520
what people often do in trying to

935
00:36:16,490 --> 00:36:14,280
recover old protein folds or old protein

936
00:36:17,990 --> 00:36:16,500
fold fragments they look at the

937
00:36:20,660 --> 00:36:18,000
recruitment of a thing that's

938
00:36:23,990 --> 00:36:20,670
effectively a unit that can move that

939
00:36:27,740 --> 00:36:24,000
can mutate that can do whatever is it

940
00:36:30,829 --> 00:36:27,750
possible to think about looking at the

941
00:36:34,309 --> 00:36:30,839
repurposing of existing structures

942
00:36:37,430 --> 00:36:34,319
with minimal changes in the way fold

943
00:36:39,710 --> 00:36:37,440
reconstruction people often do and then

944
00:36:42,349 --> 00:36:39,720
looking at these as sort of attractors

945
00:36:45,190 --> 00:36:42,359
to the viable states that tell you where

946
00:36:48,170 --> 00:36:45,200
a spandrel can be anchored to make a

947
00:36:50,329 --> 00:36:48,180
kind of comprehensive evolutionary

948
00:36:52,549 --> 00:36:50,339
reconstruction that is both what you do

949
00:36:58,660 --> 00:36:52,559
and also makes contact with the strong

950
00:37:03,079 --> 00:37:01,370
we need to do some groundwork here right

951
00:37:06,020 --> 00:37:03,089
so what we're looking at right now these

952
00:37:08,120 --> 00:37:06,030
are structural modules whether they are

953
00:37:11,809 --> 00:37:08,130
functional modules or not are not

954
00:37:13,940 --> 00:37:11,819
necessarily discrete pieces of sequence

955
00:37:16,000 --> 00:37:13,950
and we need to get to that state in

956
00:37:18,109 --> 00:37:16,010
order to do things like ancestral

957
00:37:21,620 --> 00:37:18,119
reconstruction methods to see if we can

958
00:37:23,120 --> 00:37:21,630
figure out what are the

959
00:37:26,150 --> 00:37:23,130
intermediates between two types

960
00:37:32,059 --> 00:37:26,160
of modules what the spandrel

961
00:37:35,059 --> 00:37:32,069
would be and I'm very much inspired

962
00:37:36,230 --> 00:37:35,069
by the work of Bryan and Orban at the

963
00:37:37,970 --> 00:37:36,240
University of Maryland I don't know if

964
00:37:41,660 --> 00:37:37,980
you've seen some of this work where they

965
00:37:45,200 --> 00:37:41,670
essentially go from a three-helix protein

966
00:37:46,849 --> 00:37:45,210
that binds to BSA to a one helix three

967
00:37:48,890 --> 00:37:46,859
beta sheet protein that binds to an

968
00:37:50,390 --> 00:37:48,900
immunoglobulin and what they're able to

969
00:37:52,849 --> 00:37:50,400
do is through a series of single amino

970
00:37:55,220 --> 00:37:52,859
acid mutations keeping the binding sites

971
00:37:55,880 --> 00:37:55,230
for both of those domains intact walk

972
00:37:58,400 --> 00:37:55,890
from a protein that

973
00:37:59,750 --> 00:37:58,410
has one structure to a protein that has

974
00:38:01,609 --> 00:37:59,760
another structure and at the very center

975
00:38:03,799 --> 00:38:01,619
of that with a single amino acid

976
00:38:05,569 --> 00:38:03,809
mutation you can go to one structure or

977
00:38:07,250 --> 00:38:05,579
you can go to the other right and I

978
00:38:10,250 --> 00:38:07,260
think that what we're

979
00:38:12,589 --> 00:38:10,260
saying here with spatial connection

980
00:38:15,140 --> 00:38:12,599
being an evolutionary connection that is

981
00:38:17,000 --> 00:38:15,150
I think at this point still a hypothesis

982
00:38:18,500 --> 00:38:17,010
and the only way for us to really see

983
00:38:20,509 --> 00:38:18,510
whether that's plausible is to go into

984
00:38:21,650 --> 00:38:20,519
the laboratory and try and design some

985
00:38:23,660 --> 00:38:21,660
of these pathways and see whether

986
00:38:25,849 --> 00:38:23,670
they're plausible so the way that I

987
00:38:27,140 --> 00:38:25,859
approach that as a protein engineer would

988
00:38:29,329 --> 00:38:27,150
be to try and actually engineer some of

989
00:38:31,099 --> 00:38:29,339
these transition fossils and see whether

990
00:38:37,579 --> 00:38:31,109
we can make them behave the way that we

991
00:38:40,339 --> 00:38:37,589
would expect them to oh it was a really

992
00:38:42,859 --> 00:38:40,349
intriguing talk thank you so much so in

993
00:38:46,039 --> 00:38:42,869
terms of like the transition from this

994
00:38:48,769 --> 00:38:46,049
prebiotic to the biotic function one of

995
00:38:51,380 --> 00:38:48,779
the key questions is the

996
00:38:53,870 --> 00:38:51,390
actual maintenance and the evolution of

997
00:38:57,079 --> 00:38:53,880
the functionality of this polypeptide

998
00:39:00,109 --> 00:38:57,089
that co-associated with the metal and I

999
00:39:03,470 --> 00:39:00,119
was wondering so this type of mall this

1000
00:39:06,499 --> 00:39:03,480
type of minimal module could form in

1001
00:39:08,990 --> 00:39:06,509
prebiotic era however we all know that

1002
00:39:10,940 --> 00:39:09,000
proteins can only be replicated

1003
00:39:12,740 --> 00:39:10,950
through this genetic coding and that's

1004
00:39:15,740 --> 00:39:12,750
always been a problem but I was

1005
00:39:19,700 --> 00:39:15,750
wondering it seems like this minimal

1006
00:39:21,740 --> 00:39:19,710
module has almost minimal sequence

1007
00:39:23,390 --> 00:39:21,750
specificity meaning it doesn't

1008
00:39:27,289 --> 00:39:23,400
necessarily need to be this specific

1009
00:39:30,289 --> 00:39:27,299
sequence in its primary structure so

1010
00:39:33,410 --> 00:39:30,299
have you ever looked into like the

1011
00:39:37,039 --> 00:39:33,420
phase space the functional

1012
00:39:38,690 --> 00:39:37,049
landscape of this type of module and if

1013
00:39:42,799 --> 00:39:38,700
that landscape is big enough to

1014
00:39:45,049 --> 00:39:42,809
basically cover a wide range of

1015
00:39:47,180 --> 00:39:45,059
different combinations of amino acids

1016
00:39:50,180 --> 00:39:47,190
that can actually do this redox cycle

1017
00:39:53,599 --> 00:39:50,190
then do you think that would alleviate the

1018
00:39:57,470 --> 00:39:53,609
error catastrophe that was thought to

1019
00:39:59,870 --> 00:39:57,480
be necessary for this genetic code so

1020
00:40:01,579 --> 00:39:59,880
what you're asking if I understand and

1021
00:40:03,859 --> 00:40:01,589
correct me is did we come across

1022
00:40:06,859 --> 00:40:03,869
the one 12-amino-acid sequence that works

1023
00:40:10,099 --> 00:40:06,869
or is this just sort of one

1024
00:40:12,569 --> 00:40:10,109
very evolvable sequence and the

1025
00:40:14,910 --> 00:40:12,579
answer is that we've only tried two

1026
00:40:16,410 --> 00:40:14,920
or three sequences right and we have one

1027
00:40:20,640 --> 00:40:16,420
that doesn't work and we have two that

1028
00:40:22,710 --> 00:40:20,650
do but that's not a very

1029
00:40:23,760 --> 00:40:22,720
good sample if we wanted to answer the

1030
00:40:26,130 --> 00:40:23,770
question that you're asking

1031
00:40:27,960 --> 00:40:26,140
we wouldn't design the sequences we

1032
00:40:30,450 --> 00:40:27,970
would build libraries you know and see

1033
00:40:31,500 --> 00:40:30,460
what is the success rate for that and I

1034
00:40:34,109 --> 00:40:31,510
think that's a very good thing to try

1035
00:40:36,569 --> 00:40:34,119
definitely especially given that

1036
00:40:38,540 --> 00:40:36,579
it's the backbone it seems to be the key

1037
00:40:41,490 --> 00:40:38,550
which doesn't require the sidechain

1038
00:40:43,530 --> 00:40:41,500
might this could it kind of imply that

1039
00:40:45,180 --> 00:40:43,540
that's right I mean all of the

1040
00:40:46,410 --> 00:40:45,190
interactions here are either cysteine the

1041
00:40:48,120 --> 00:40:46,420
cysteines have to be there the four

1042
00:40:51,329 --> 00:40:48,130
cysteines and then everything else is

1043
00:40:52,800 --> 00:40:51,339
backbone so in theory there should be

1044
00:40:59,690 --> 00:40:52,810
highly designable and highly evolvable

1045
00:41:04,920 --> 00:41:01,740
can't talk to you until you get the cube

1046
00:41:08,280 --> 00:41:04,930
Thanks yeah thanks for the great talk

1047
00:41:10,589 --> 00:41:08,290
I have two questions one's maybe easy

1048
00:41:14,190 --> 00:41:10,599
and the other one's a little bit harder

1049
00:41:15,960 --> 00:41:14,200
I think in 2008 Les Dutton's group or

1050
00:41:17,370 --> 00:41:15,970
another piece of work out of that group

1051
00:41:18,150 --> 00:41:17,380
they created these things that they were

1052
00:41:20,910 --> 00:41:18,160
calling them maquettes

1053
00:41:22,980 --> 00:41:20,920
and I think they had 16 amino acids

1054
00:41:24,930 --> 00:41:22,990
and I was just wondering what's the

1055
00:41:25,829 --> 00:41:24,940
commonality or difference between what

1056
00:41:30,950 --> 00:41:25,839
you're showing and what they were

1057
00:41:34,319 --> 00:41:30,960
showing so that sequence is

1058
00:41:35,670 --> 00:41:34,329
very similar to this if you look closely

1059
00:41:37,980 --> 00:41:35,680
at this you can essentially see that

1060
00:41:40,260 --> 00:41:37,990
there's a cysteine two amino acids and

1061
00:41:42,180 --> 00:41:40,270
another cysteine the difference there is

1062
00:41:44,250 --> 00:41:42,190
that they have three amino acids between

1063
00:41:46,680 --> 00:41:44,260
and they're glycines so they're very

1064
00:41:48,720 --> 00:41:46,690
flexible right so there's no there's not

1065
00:41:51,960 --> 00:41:48,730
necessarily a specific conformation for

1066
00:41:53,370 --> 00:41:51,970
those and what I would argue the reason

1067
00:41:55,670 --> 00:41:53,380
why I think this peptide is so well

1068
00:41:58,020 --> 00:41:55,680
behaved in terms of both yield

1069
00:42:01,079 --> 00:41:58,030
specificity for iron sulfur binding and

1070
00:42:03,690 --> 00:42:01,089
then also redox stability is that it

1071
00:42:05,490 --> 00:42:03,700
adopts we've essentially addressed the

1072
00:42:07,530 --> 00:42:05,500
Levinthal paradox for this it has a

1073
00:42:10,290 --> 00:42:07,540
unique conformation in the apo state

1074
00:42:11,940 --> 00:42:10,300
that we believe is in fact

1075
00:42:16,260 --> 00:42:11,950
preorganized to bind the iron sulfur

1076
00:42:18,059 --> 00:42:16,270
cluster and so in that sense it's better

1077
00:42:19,230 --> 00:42:18,069
behaved but in a lot of ways it's

1078
00:42:21,660 --> 00:42:19,240
similar right essentially what they did

1079
00:42:23,430 --> 00:42:21,670
also was to go into ferredoxin see what

1080
00:42:24,809 --> 00:42:23,440
was the spacing between cysteines

1081
00:42:27,870 --> 00:42:24,819
and then just make a model peptide a

1082
00:42:29,280 --> 00:42:27,880
maquette that had that same spacing and

1083
00:42:31,230 --> 00:42:29,290
essentially all we've done here is

1084
00:42:32,970 --> 00:42:31,240
rather than look at the sequence we've

1085
00:42:34,079 --> 00:42:32,980
looked at the structure tried to figure

1086
00:42:36,300 --> 00:42:34,089
out what are the key elements of the

1087
00:42:38,400 --> 00:42:36,310
structure that give you that iron

1088
00:42:40,530 --> 00:42:38,410
sulfur binding and make a maquette

1089
00:42:41,970 --> 00:42:40,540
that mimics that okay and then the

1090
00:42:46,440 --> 00:42:41,980
second question that I thought that was

1091
00:42:49,800 --> 00:42:46,450
the easy question yeah so maybe a little

1092
00:42:52,380 --> 00:42:49,810
bit speculative but for these four iron

1093
00:42:54,270 --> 00:42:52,390
four sulfur cubes the possibility

1094
00:42:56,309 --> 00:42:54,280
of site differentiation of the cluster

1095
00:42:58,020 --> 00:42:56,319
is really important for like

1096
00:43:00,290 --> 00:42:58,030
aconitase activity like Joseph talked

1097
00:43:03,660 --> 00:43:00,300
about but also the entire radical SAM

1098
00:43:06,329 --> 00:43:03,670
protein family has a similar mode of

1099
00:43:08,640 --> 00:43:06,339
binding for in that case the

1100
00:43:10,500 --> 00:43:08,650
S-adenosylmethionine so the site

1101
00:43:13,050 --> 00:43:10,510
differentiation having this open iron

1102
00:43:14,700 --> 00:43:13,060
coordination seems important do you

1103
00:43:17,190 --> 00:43:14,710
think it will be possible to make an

1104
00:43:20,430 --> 00:43:17,200
open coordination site we're definitely gonna

1105
00:43:22,530 --> 00:43:20,440
try right I mean we're definitely going to remove

1106
00:43:24,930 --> 00:43:22,540
one of the ligands or replace them

1107
00:43:28,530 --> 00:43:24,940
with imidazole combinations of

1108
00:43:30,740 --> 00:43:28,540
things so if you know if for example

1109
00:43:33,720 --> 00:43:30,750
having a three iron four sulfur site is

1110
00:43:36,089 --> 00:43:33,730
advantageous or whether we can stabilize

1111
00:43:38,220 --> 00:43:36,099
that fourth metal with a hydroxyl

1112
00:43:40,559 --> 00:43:38,230
that can then be replaced with an active

1113
00:43:51,290 --> 00:43:40,569
site ligand and that would be very

1114
00:43:53,400 --> 00:43:51,300
exciting I think so too yes with a P loop

1115
00:44:03,510 --> 00:43:53,410
well I'm sorry I don't understand what

1116
00:44:05,220 --> 00:44:03,520
wouldn't matter with me yes that's right

1117
00:44:06,390 --> 00:44:05,230
so another have had the glycines in

1118
00:44:08,010 --> 00:44:06,400
there so you could access that

1119
00:44:10,620 --> 00:44:08,020
left-handed conformation right rather

1120
00:44:12,540 --> 00:44:10,630
than using non-natural amino acids yes

1121
00:44:13,950 --> 00:44:12,550
so you could say that's right from the

1122
00:44:15,660 --> 00:44:13,960
very beginning you've probably got a P

1123
00:44:17,880 --> 00:44:15,670
loop right at the beginning very early

1124
00:44:20,430 --> 00:44:17,890
in life and all you've got is P loops

1125
00:44:21,870 --> 00:44:20,440
everywhere P loops yeah so and this

1126
00:44:23,160 --> 00:44:21,880
shares a lot of similarity to P loops I

1127
00:44:25,140 --> 00:44:23,170
mean essentially in a P loop all you

1128
00:44:26,900 --> 00:44:25,150
have is a row of amides pointing at

1129
00:44:28,650 --> 00:44:26,910
your nucleotide so

1130
00:44:30,660 --> 00:44:28,660
electrostatically this behaves a lot

1131
00:44:32,760 --> 00:44:30,670
like the P loop I think what's unique

1132
00:44:35,220 --> 00:44:32,770
about this topology is that it also presents

1133
00:44:36,020 --> 00:44:35,230
the primary shell ligands in a sidechain

1134
00:44:37,820 --> 00:44:36,030
conformations

1135
00:44:39,590 --> 00:44:37,830
they can hold a metal core element yes

1136
00:44:41,690 --> 00:44:39,600
so just one question about that then

1137
00:44:44,930 --> 00:44:41,700
well by the way are there glycines in

1138
00:44:46,430 --> 00:44:44,940
this in the 12 amino acid ones no

1139
00:44:48,680 --> 00:44:46,440
there are no glycines in this one so it's

1140
00:44:50,090 --> 00:44:48,690
also at alum D amino acids and they're

1141
00:44:52,430 --> 00:44:50,100
already amino acids there Ellen do you

1142
00:44:55,030 --> 00:44:52,440
know ask could you imagine a loop like

1143
00:44:58,160 --> 00:44:55,040
that without cysteine

1144
00:44:59,450 --> 00:44:58,170
could you imagine oh absolutely yeah you

1145
00:45:00,710 --> 00:44:59,460
could but then all the

1146
00:45:01,610 --> 00:45:00,720
interactions would be electrostatic and

1147
00:45:04,310 --> 00:45:01,620
so maybe then that could bind a

1148
00:45:07,070 --> 00:45:04,320
phosphate much like much like these

1149
00:45:09,110 --> 00:45:07,080
cationic nests do or other anions

1150
00:45:14,930 --> 00:45:09,120
I asked because it looks like cysteine's

1151
00:45:19,850 --> 00:45:14,940
quite hard to make early on in life so

1152
00:45:22,460 --> 00:45:19,860
Mike what what were the file-based amino

1153
00:45:28,520 --> 00:45:22,470
acids before cysteine there must have

1154
00:45:29,720 --> 00:45:28,530
been a file so what were they that we're

1155
00:45:31,970 --> 00:45:29,730
I mean we can make something

1156
00:45:34,120 --> 00:45:31,980
structurally anything we want we don't

1157
00:45:36,410 --> 00:45:34,130
have to use the natural alphabet here

1158
00:45:45,200 --> 00:45:36,420
what would you what would you suggest

1159
00:45:47,030 --> 00:45:45,210
there's a possibility okay I mean we

1160
00:45:49,730 --> 00:45:47,040
just need to make we need to make some

1161
00:45:51,890 --> 00:45:49,740
bonds between amino acids to make a ring

1162
00:45:55,040 --> 00:45:51,900
or a linear structure or maquette if you

1163
00:45:57,560 --> 00:45:55,050
will but but the point is we understand

1164
00:46:00,020 --> 00:45:57,570
of course that imidazoles and

1165
00:46:03,980 --> 00:46:00,030
thiols and the cysteines are not

1166
00:46:05,450 --> 00:46:03,990
found in chondritic meteorites so what

1167
00:46:11,750 --> 00:46:05,460
would you but there were hydrogen

1168
00:46:13,490 --> 00:46:11,760
sulfide so what do you give us what

1169
00:46:22,970 --> 00:46:13,500
do you get to play with

1170
00:46:27,010 --> 00:46:22,980
just glycine my favorites are

1171
00:46:30,140 --> 00:46:27,020
alanine especially and asparagine and

1172
00:46:31,730 --> 00:46:30,150
[unclear] okay and probably aspartate

1173
00:46:34,100 --> 00:46:31,740
okay and I think I'm kind of stuck with

1174
00:46:36,200 --> 00:46:34,110
those three okay so we like aspartate

1175
00:46:37,760 --> 00:46:36,210
maybe not for iron sulfur but for

1176
00:46:41,150 --> 00:46:37,770
binding other types of metal clusters

1177
00:46:45,710 --> 00:46:41,160
definitely like manganese oxides for

1178
00:46:48,950 --> 00:46:45,720
example anyone have a last question for

1179
00:46:50,120 --> 00:46:48,960
the speaker in that case

1180
00:46:54,620 --> 00:46:50,130
thank you

1181
00:47:14,130 --> 00:46:54,630
[Applause]